How to Capture the Zeitgeist via Eigen-tweets?
A Short Answer Arguably, the aforementioned enormous daily load of tweets largely encapsulates the worldwide emerging trends and breaking news. Therefore, in the past few years, tweet trend-clustering and geo-temporal trend tracking (a.k.a. zeitgeist tracking) has attracted considerable interest, with a plethora of online applications B. Parr (2009, Apr. 29). "Swine flu's tweet tweet causes online flutter," ''Business Standard'' [Online]. Available: http://mashabble.com/2009/04/04/twitter-trends performing data (tweet message) and metadata (tweet author, location, number of re-tweets, etc.) search queries on the open tweet database. Modern popular techniques are either (i) singling out the most popular tweets or (ii) delivering in single words (or hashtags) the pivotal points of emerging trends and breaking news. Although existing approaches seem to largely capture the ``spirit of the time", they suffer from some major drawbacks. On the one hand, the ``popular tweets" approach fails to accomplish ''complete'' trend identification, since the most popular tweets may span only a small subset of the emerging trends, while, on the other hand, single-word trend titling lacks in concept ''descriptiveness''. Following motivational Example 1 highlights these drawbacks. Example 1 In Table I, lies a pool of 8 tweets, available for processing. We consider these tweets to have been posted simultaneously in [http://en.wikipedia.org/wiki/Greece Greece] and to be, other than geographically, metadata-wise independent. In the second column, as a measure of popularity, we append the aggregate number of re-tweets, favors, and replies. Say that we are interested in extracting 3 objects (let them be words, phrases or tweets) that encapsulate the major emerging trends and/or breaking news and are homogeneous; i.e., an object cannot contain information for more than one trend or piece of news. Certainly, the first popular tweets method would return items 1, 4, and 8, capturing only tweets referring to the recent financial developments in [https://en.wikipedia.org/wiki/Cyprus Cyprus] . On the other hand, the single-word trend titling, in its simplest variant, would return the words ``Greek", ``Cyprus", and ``bailout", failing to offer a comprehensive depiction of the trends. State-of-the-art research on large-scale [http://en.wikipedia.org/wiki/Sparse_PCA sparse principal component analysis (PCA)] D. S. Papailiopoulos, A. G. Dimakis, S. Korokythakis, "Sparse PCA through low-rank approximations," ''arXiv:1303.055lvl [stat.ML]'', Mar. 2013. introduces the concept of Eigen-tweet (ET) basis. An ET is a tweet-like collection of key-words put together to comprehensively describe the most important piece of news or trend. Respectively, ET basis is a set of ET's that are content-wise orthogonal to each other and span/capture the most important part of the zeitgeist similarly to the way the eigen-basis spans the range of a matrix. Although based on a simple concept, the Eigen-tweet basis extraction can be quite a challenging task (or even intractable) in terms of complexity, considering that meaningful results can only be derived operating on enormous tweet data sets. Interestingly, authors in , M. Asteris, D. S. Papailiopoulos, and G. N. Karystinos, ``Sparse principal component of a rank-deﬁcient matrix," in ''Proc. IEEE Intern. Symp. Inf. Theory'', St. Petersburg, Russia, Aug. 2011, pp. 673-677. have delivered novel techniques for efficient large scale sparse PCA, that, applied on huge tweet data set can deliver its principal components practically fast. In the sequel, we formally introduce the Eigen-tweets concept. A Long Answer Problem Statement Let a collection of N tweet messages t_1, t_2, \cdots comprise the available tweet data-set \mathcal T=\{ t_1, t_2, \cdots, t_N\} \ \ \ \ \ \ \ \ \ \ (1) with cardinality |\mathcal T|=N . Following the bag-of-words model W. Cohen, ``Bag of words and word positions," in ''Advances in inductive logic programming'', Amsterdam, NL: IOS Press, 1995, pp. 124--143., we build our dictionary \mathcal D=\{w_1, w_2, \cdots, w_M\} \ \ \ \ \ \ \ \ \ \ (2) as a selection of M non-trivial (pronouns, proverbs, pro-adverbs, pro-adjectives, etc. are considered to be trivial) preferably grammatically uncorrelated words appearing in \mathcal T . Certainly, for the problem at hand, we operate in the intersection of high-dimensionality and big-data regimes and N and M are large numbers, at maybe in the order of tens of thousands. Then, we build the boolean mapping matrix \mathbf S \in \{0,1\}^{M \times N} such that [\mathbf S]_{i,j}=1 \Leftrightarrow w_i appears in x_j. \ \ \ \ \ \ \ \ \ \ (3) Importantly, since each tweet constitutes of 140 characters, with the English words being 4.5 characters long on average R. J. Pierce, ''An introduction to information theory: Symbols, signals and noise'', 2nd ed. New York, NY: Dover Publications, 1980., the columns of \mathbf S will be sparse. Next, we define the M \times M nonnegative word-correlation matrix \mathbf A_w ~~ \overset{\triangle}{=} ~~ \mathbf S \mathbf S^T\ \ \ \ \ \ \ \ \ \ (4) so that [\mathbf A_w]_{i,j} \leq \min_{i,j} \| [\mathbf S]_{i,:}^T \|_1 counts the number of tweets in \mathcal T in which w_i and w_j coexist. In a large-scale weighted graph interpratation, if each word in the dictionary is represented by a vertix and two vertices are connected with an edge weighted by the number of tweets in which these two words coexist, then \mathbf A_w is the graph's (weighted) adjacency matrix. Notice that the N \times N tweet-correlation matrix \mathbf A_t \overset{\triangle}{=} \mathbf S^T\mathbf S is such that [\mathbf A_t]_{i,j} counts the number of common words in t_i and t_j . Vanilla PCA Eigen-tweets Simple principal eigen-tweet extraction could be carried out by means of ``explained variance" maximization \mathcal P: ~~~~ \begin{array}{l} \underset{\mathbf x\in \mathbb{R}^{M \times M}}{\mathrm{maximize}} ~~\mathbf x^T \mathbf A_w \mathbf x \\ \mathrm{subject~ to} ~\| \mathbf x\|_2 =1 \end{array} \ \ \ \ \ \ \ \ \ \ (5) also known as the ``Vanilla PCA" problem. Then, denoting by \mathcal I^* the set of the indices of the positive entries of \mathbf x^* , |\mathcal I^*| = \| \mathbf x^* \|_0 , with \| \cdot \|_0 denoting the l_0 -norm of a vector --counting the number of non-zero entries, we form the eigen-tweet t^* = \bigcup_{i \in \mathcal I^*} w_i. \ \ \ \ \ \ \ \ \ \ (6) The rationale behind this eigen-tweet construction is that [\mathbf x^*]_i will have a great value if is w_i popular and, therefore \|[\mathbf A]_{:,i}\|_1 large. Certainly, since there is no guarantee that \mathbf x^* is sparse, t^* may be too long and hard to interpret --hence, inappropriate for an eigen-tweet. This simple observation motivates the pursue of a '''sparse''' principal component of the adjacency matrix \mathbf A_w . Sparse PCA Eigen-tweets An alternative problem formulation for the derivation of the principal eigen-tweet, conforming with the risen sparsity demand, is \mathcal P_s: ~~~~ \begin{array}{l} \underset{\mathbf x\in \mathbb{R}^{M \times M}}{\mathrm{maximize}} ~~\mathbf x^T \mathbf A_w \mathbf x \\ \mathrm{subject~ to} ~\| \mathbf x\|_2 =1~~ \mathrm{ and }~~ \| \mathbf x\|_0 = k \end{array} \ \ \ \ \ \ \ \ \ \ (7) where k is the number of words comprising the eigen-tweet and \mathcal P_s is the well-known sparse principal-component analysis problem A. d'Aspremont, L. El Ghaoui, M. Jordan, and G. Lanckriet, ``A direct formulation of sparse PCA using semideﬁnite programming," ''SIAM Review'', 49(3), 2007., A. A. Amini and M. Wainwright, ``High-dimensional analysis of semideﬁnite relaxations for sparse principal components," ''The Annals of Statistics'', 37(5B):2877-2921, 2009., B. Moghaddam, Y. Weiss, and S. Avidan, ``Generalized spectral bounds for sparse lda," in ''Proceedings of the 23rd international conference on Machine learning'', pp. 641-648, 2006. , B. Moghaddam, Y. Weiss, and S. Avidan, ``Spectral bounds for sparse PCA: exact and greedy algorithms," ''Advances in Neural Information Processing Systems'', 2006, p. 18., A. d’Aspremont, F. Bach, and L. El Ghaoui ``Optimal solutions for sparse principal component analysis," ''Journal of Machine Learning Research'', 2008, pp. 1269-1294., H. Zou, T. Hastie, and R. Tibshirani, ``Sparse principal component analysis," ''Journal of Computational and Graphical Statistics'', 2006, pp. 265-286., H. Shen and J. Z. Huang , ``Sparse principal component analysis via regularized low rank matrix approximation," ''J. Multivar. Anal.'', July 2008, pp. 1015-1034.. The l_0 cardinality constraint limits the optimization over vectors with k non-zero entries and (7) is known to be [http://en.wikipedia.org/wiki/NP-hard NP-hard] -having (\begin{array}{c} M \\ k\end{array}) candidate solutions- and, thus, computationally intractable in general. However, recent algorithmic developments on sparse PCA in show the way for tractable solution approximation to \mathcal P_s , with proven performance degradation bounds, and present the derived results in the context of zeitgeist-recording in large tweet data-sets. The state-of-the-art algorithm developed in , baring the name ''Spannogram'', is briefly presented below. Efficient Sparse PCA and Eigen-tweet Extraction via ''Spannogram'' According to , the NP-hard solution to \mathcal P_{s} can be efficiently approximated -with bounded performance degradation- in the three following steps. #Instead of \mathbf A_{w} , use in (5) its "best" rank- d approximation \mathbf A_{w}^{(d)} , derived by means of conventional singular-value decomposition (SVD). #Use \mathbf A_{w}^{(d)} to obtain \mathcal O( M^{d}) candidate sparse principal components (Spannogram). # Browse the size- \mathcal O( M^{d}) candidate support and make a maximum-explained-variance choice. Low-rank Approximation Observe that \rho ~~ \overset{\triangle}{=} ~~ \mathrm{ rank}(\mathbf A_{w}) \leqslant \mathrm{min}(M,N) and let some d of interest such that d \leqslant \rho . The problem of finding the rank- d approximation of \mathbf A_w is formally \mathcal P_{LR}: ~~~~ \begin{array}{l} \underset{\mathbf D\in \mathbb{R}^{M \times M}}{\mathrm{minimize}} ~~\mathbf \| \mathbf \mathbf A_{w} - \mathbf D \|_F \\ \mathrm{subject~ to} ~~ \mathrm{rank}(\mathbf D)~=~d \end{array} \ \ \ \ \ \ \ \ \ \ (8) The analytical solution to \mathcal P_{LR} comes in terms of the singular value decomposition of \mathbf S . Specifically, let \mathbf S admit SVD \mathbf S~=~\mathbf U \mathbf \Sigma \mathbf V^T, \ \ \ \ \ \ \ \ \ \ (9) where \mathbf U^T \mathbf U~=~\mathbf I_\rho , \mathbf V^T \mathbf V~=~\mathbf I_\rho and \mathbf \Sigma is the positive diagonal matrix containing the singular values of \mathbf S . Then, if \mathbf \Lambda^{(d)} ~=~ ( \left[\mathbf \Sigma\right]_{1:d,1:d})^2 and \mathbf U^{(d)}~=\left[\mathbf U \right]_{:,1:d} , the solution to \mathcal P_{LR} is C. Eckart and G. Young, "The approximation of one matrix by another of lower rank," ''Psychometrika'', vol. 1, pp. 211-218, 1936. \mathbf A_{w}^{(d)}~=~\mathbf U^{(d)}\mathbf \Lambda^{(d)}\mathbf U^{(d)T}. \ \ \ \ \ \ \ \ \ \ (10) The ''Spannogram'' Algorithm ( d = 1,2 ) In the sequel, for simplicity purposes, we present the ''Spannogram'' algorithm for the special cases where d=1 and d=2. Further details can be found in . Rank-1 Case We observe that \mathbf A_{w}^{(1)}~=~\mathbf U^{(1)}\mathbf \Lambda^{(1)} \mathbf U^{(1)T} , with \mathbf U^{(1)} ~=~ \mathbf u \in {\mathbb R}^{M \times 1} , \| \mathbf u \|_2 ~ =~1 , and \mathbf \Lambda^{(1)} ~=~\lambda \in {\mathbb R}_{++} . Then, for any \mathbf x \in {\mathbb R}^{M \times 1} , \mathbf x^{T} \mathbf A_{w}^{(1)} \mathbf x ~=~ \lambda \mathbf x^{T} \mathbf u \mathbf u^{T} \mathbf x ~=~ \lambda (\mathbf u^{T} \mathbf x)^2. \ \ \ \ \ \ \ \ \ \ (11) The solution to \begin{array}{l} \underset{\mathbf x \in \mathcal{Z} _{k}}{\mathrm{maximize}} ~~\mathbf \lambda (\mathbf u^{T} \mathbf x)^2 \end{array} \ \ \ \ \ \ \ \ \ \ (12) where \mathcal{Z}_{k} ~=~ \left \{ \mathbf x \in {\mathbb R}^{M \times 1} : \| \mathbf x \|_2 = 1, \|\mathbf x\|_0 = k \right \} , is simply \mathbf x_{\mathrm{opt}} = \frac {\mathbf z} {\|\mathbf z\|_2} , where \left [\mathbf z \right ]_i = \left\{\begin{matrix} \left [ \mathbf u \right ]_i, & \mathrm{if}~i \in \mathcal{I}_k(\mathbf u) \\ 0,& \mathrm{otherwise} \end{matrix}\right. i=1,2, \cdots, M , and, for any \mathbf x \in {\mathbb R}^{M \times 1},~ \mathcal{I}_k(\mathbf x) is the set containing the indices of the k largest (in the absolute sense) entries of \mathbf x . Rank-2 Case We observe that \mathbf A_{w}^{(2)}~=~\mathbf U^{(2)}\mathbf \Lambda^{(2)} \mathbf U^{(2)T} , with \mathbf U^{(2)} ~=~ \left [ \mathbf u_1,~ \mathbf u_2 \right ] \in {\mathbb R}^{M \times 2} , \| \mathbf u_1 \|_2 ~=~ \| \mathbf u_2 \|_2 ~ =~1 , \mathbf u_1^{T} \mathbf u_2 ~=~ 0 , and \mathbf \Lambda^{(2)} ~=~\begin{bmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{bmatrix} \in {\mathbb R}_{++}^{2 \times 2} . If we define \mathbf V=[\sqrt{\lambda_1} \mathbf u_1,~\sqrt{\lambda_2} \mathbf u_2] , then for any \mathbf x \in {\mathbb R}^{M \times 1} , \mathbf x^{T} \mathbf A_{w}^{(2)} \mathbf x ~=~ \| \mathbf V^{T} \mathbf x \|_2^{2}. \ \ \ \ \ \ \ \ \ \ (13) In view of the Cauchy-Schwarz inequality, \|\mathbf V^T \mathbf x \|_2^2 ~=~ \underset{\| \mathbf c \|_2 = 1} \mathrm{max} (\mathbf c^T \mathbf V^T \mathbf x)^2 and \underset{\mathbf x \in \mathcal{Z}_k}{\mathrm{max}}~~ \mathbf x^T \mathbf A_w^{(2)} \mathbf x ~=~ \underset{\mathbf x \in \mathcal{Z}_k}{\mathrm{max}}~~\underset{\mathbf c \in \mathcal{S}_2}{ \mathrm{max}}~~ (\mathbf c^T \mathbf V^T \mathbf x)^2 ~=~ \underset{\mathbf c \in \mathcal{S}_2}{\mathrm{max}}~~\underset{ \mathbf x \in \mathcal{Z}_k} {\mathrm{max}}~~ (\mathbf c^T \mathbf V^T \mathbf x)^2 \ \ \ \ \ \ \ \ \ \ (14) where \mathcal{S}_L is the surface of a L -dimensional unit-radius [http://en.wikipedia.org/wiki/N-sphere hypersphere] (that is, \mathcal{S}_L ~=~ \left \{ \mathbf x \in {\mathbb R}^{L} : \|\mathbf x \|_2 = 1 \right \} ) . Since any point \mathbf c on the perimeter of a unit-radius circle (i.e., the surface of a 2 -dimensional unit-radius sphere) is equivalent to -\mathbf c in terms of the maximization argument in (14), it suffices to restrict ourselves to half-circle browsing and express \mathbf c in terms of a unique angle \phi \in ( - \pi /2 , \pi /2 ] as \mathbf c ~=~ \begin{bmatrix} \mathrm{ sin}( \phi) \\ \mathrm{ cos} (\phi) \end{bmatrix}. \ \ \ \ \ \ \ \ \ \ (15) Therefore, \underset{\mathbf x \in \mathcal{Z}_k} \mathrm{max}~ \mathbf x^T \mathbf A_w^{(2)} \mathbf x ~=~ \underset{\phi \in ( - \pi /2 , \pi /2 ]} \mathrm{max} ~\underset{ \mathbf x \in \mathcal{Z}_k} \mathrm{max} (\mathbf v(\phi)^T \mathbf x)^2 \ \ \ \ \ \ \ \ \ \ (16) where \mathbf v(\phi) = \sqrt{\lambda_1}\mathrm{cos}(\phi)\mathbf u_1 + \sqrt{\lambda_2}\mathrm{sin}(\phi)\mathbf u_2 . Interestingly, for any fixed \phi , the solution to \underset{\mathbf x \in \mathcal{Z}_k} \mathrm{maximize}~ (\mathbf v(\phi)^T \mathbf x)^2, \ \ \ \ \ \ \ \ \ \ (17) is known by the aferomentioned solution analysis of the rank-1 case -see (14). Therefore, \underset{\mathbf x \in \mathcal{Z}_k} \mathrm{arg~max}~ \mathbf x^T \mathbf A_w^{(2)} \mathbf x ~=~ \mathbf x_{\mathrm{opt}}(\mathbf \phi_{opt}) \ \ \ \ \ \ \ \ \ \ (18) where \phi_{opt} = \underset{\phi \in ( - \pi /2 , \pi /2 ]} \mathrm{arg~max} \mathbf x_{\mathrm{opt}}(\phi)^T \mathbf A_w^{(2)} \mathbf x_{\mathrm{opt}}(\phi) and \mathbf x_{\mathrm{opt}}(\phi) is the solution to the problem in (17) for any fixed \phi \in ( - \pi /2 , \pi /2 ] . At first sight, it seems like one should exhaustively browse uncountably infinite angles to find (18). However, quite importantly, it is proven that one suffices to look for \phi_{\mathrm{opt}} in an easily-found size- (4 \binom {M}{2}) set of angles that correspond to different \mathcal{I}_{k}(\mathbf v(\phi)) . Therefore, instead of \binom {M}{k} , there are, in fact, (4 \binom {M} {2}) = O(M^2) candidate solutions to \underset{\mathbf x \in \mathcal{Z}_k} \mathrm{maximize}~ \mathbf x^T A_w^{(2)} \mathbf x. For more information on sparse PCA, we refer the interested reader to , , , , , , , , , , and references therein. Question 1 Consider full-rank matrix \mathbf A \in {\mathbb R}^{N \times N} ( \mathrm{rank}(\mathbf A)=N ) and the symmetric positive-definite matrix \mathbf A^T \mathbf A admitting eigenvalue-decompostion (EVD) \mathbf A^T \mathbf A = \mathbf Q \mathbf \Lambda \mathbf Q^T where \mathbf Q = [\mathbf q_1, \mathbf q_2,..., \mathbf q_N] , \mathbf Q^T \mathbf Q = \mathbf {I}_{N} , and \mathbf \Lambda = \mathrm{diag}([\lambda_1, \lambda_2, \cdots, \lambda_N]^T) with \lambda_1 > \lambda_2 > ...> \lambda_N > 0 . Given some positive k < N , define \mathbf Q_k =[\mathbf q_1, \mathbf q_2, \cdots, \mathbf q_k] , \tilde{\mathbf Q}_k =[\mathbf q_{k+1}, \cdots, \mathbf q_N] , \mathbf P_k ~=~\mathbf Q_k \mathbf Q_k^T , and solve \begin{array}{l} \underset{\mathbf x \in \mathbb{R}^{N}}{\mathrm{maximize}} ~~\mathbf \| \mathbf x^T (\mathbf I_N - \mathbf P_k) \mathbf A \|_2 \\ \mathrm{subject~ to} ~~ \|\mathbf x\|_2~=~1 \end{array} Answer The solution is \mathbf x_{opt} = \mathbf q_{k+1} . Proof: Define \mathbf \Lambda_k = \begin{bmatrix} \lambda_1 & & \\ & \ddots & \lambda_k \end{bmatrix} , and \tilde{ \mathbf \Lambda}_k = \begin{bmatrix} \lambda_{k+1} & & \\ & \ddots & \lambda_N \end{bmatrix} , and observe that \mathbf Q_k^T\mathbf Q_k = \mathbf I_k , \tilde{\mathbf Q}_k^T \tilde{\mathbf Q}_k = \mathbf I_{N-k} , and \tilde{ \mathbf Q}_k^T \mathbf Q_k =\mathbf 0_{N-k \times K} . Then, \mathbf A^T \mathbf A = \mathbf Q \mathbf \Lambda \mathbf Q^T = \mathbf Q_k \mathbf \Lambda_k \mathbf Q_k^T + \tilde{\mathbf Q}_k \tilde{\mathbf \Lambda_k} \tilde{\mathbf Q_k}^T . By definition, \mathbf P_k = \mathbf Q_k \mathbf Q_k^T and, since \mathbf Q^T \mathbf Q= \mathbf I_{N} , \mathbf I_N - \mathbf P_k = \tilde{\mathbf Q}_k \tilde{\mathbf Q}_k^T . Therefore, (\mathbf I_N- \mathbf P_k)\mathbf A^T\mathbf A(\mathbf I_N- \mathbf P_k) = \tilde{\mathbf Q}_k \tilde{\mathbf Q}_k^T (\mathbf Q_k \mathbf \Lambda_k \mathbf Q_k^T + \tilde{\mathbf Q}_k \tilde{\mathbf \Lambda}_k \tilde{\mathbf Q}_k^T) \tilde{\mathbf Q}_k \tilde{ \mathbf Q}_k = \tilde{\mathbf Q}_k \tilde{\mathbf \Lambda}_k \tilde{\mathbf Q}_k^T . Evidently, the solution to the problem at hand and, thus, to \begin{array}{l} \underset{\mathbf x \in \mathbb{R}^{N \times 1}}{\mathrm{maximize}} ~~\mathbf x^T \tilde{\mathbf Q}_k \tilde{\mathbf \Lambda}_k \tilde{\mathbf Q}_k^T \mathbf x \\ \mathrm{subject~ to} ~~ \|\mathbf x\|_2~=~1 \end{array} equals \mathbf q_{k+1} , since \lambda_{k+1} > \lambda_{k+2} > ... > \lambda_{N} > 0 . Question 2 Consider rank- M matrix \mathbf A \in {\mathbb R}^{M \times M} and let \mathcal{I}_K be a given subset of \{1,2, \cdots ,M\} with cardinality |\mathcal I_{K}|=K < M and M \times M matrix \mathbf P , such that, \forall i,j \in \{1, 2, \cdots, M\} , [\mathbf P]_{i,j} ~=~ \left\{\begin{matrix} 1,& \mathrm{if}~ i=j \in \mathcal{I}_K\\ 0,&\mathrm{otherwise} \end{matrix} \right. . Assume now that, for any rank- N matrix \mathbf B \in {\mathbb R}^{N \times N} , the EVD solution to \begin{array}{l} \underset{\mathbf x \in \mathbb{R}^{N \times 1}}{\mathrm{maximize}} ~~\mathbf x^T \mathbf B \mathbf B^T \mathbf x \\ \mathrm{subject~ to} ~~ \|\mathbf x\|_2~=~1 \end{array} can be obtained for no less than $N . Describe how one could use EVD to solve \mathcal Q_2:~~~~ \begin{array}{l} \underset{\mathbf x \in \mathbb{R}^{M \times 1}}{\mathrm{maximize}} ~~\mathbf x^T \mathbf P \mathbf A \mathbf A^T \mathbf P \mathbf x \\ \mathrm{subject~ to} ~~ \|\mathbf x\|_2~=~1 \end{array} expending less than $(K+1) . Answer Define \tilde{\mathbf B} = \mathbf B_{\mathcal{I}_K,:} (i.e., the matrix built only by the rows of \mathbf B with index in \mathcal{I}_K ). Then, solve \tilde{\mathbf x}_{opt} ~~ = ~~ \underset{\mathbf x \in \mathbb{R}^{|\mathcal I_K| \times 1}, \|\mathbf x\|_2~=~1}{\mathrm{argmax}} ~~\mathbf x^T \tilde{\mathbf B} \tilde{\mathbf B}^T \mathbf x. Finally, the solution to \mathcal Q_2 can be obtained in two simple steps: \begin{array}{l} (s.1)~~ \mathbf x_{opt} \leftarrow \mathbf 0_{N \times 1}, \\ (s.2)~~ [\mathbf x_{opt}]_{\mathcal{I}_K} \leftarrow \tilde{\mathbf x}_{opt}. \end{array} References